A Coarse-Grain Hierarchical Technique for 2-Dimensional FFT on Configurable Parallel Computers

Authors

  • Xizhen Xu
  • Sotirios G. Ziavras

Abstract

FPGAs (Field-Programmable Gate Arrays) have been widely used as coprocessors to boost the performance of data-intensive applications [1][2]. However, several challenges stand in the way of further improving FPGA performance. The communication overhead between the host workstation and the FPGAs can be substantial. Large-scale applications cannot fit in a single FPGA because of its limited capacity. Mapping an application algorithm to FPGAs also remains a daunting task in configurable system design. To circumvent these problems, we propose in this paper the FPGA-based Hierarchical-SIMD (H-SIMD) machine, codesigned with the Pyramidal Instruction Set Architecture (PISA). PISA comprises high-level instructions, implemented as FPGA functions, that perform coarse-grain SIMD (Single-Instruction, Multiple-Data) tasks. This design eases program development, provides code portability across different H-SIMD implementations, and delivers high performance. We assume a multi-FPGA board where each FPGA is configured as a separate SIMD machine. If needed, multiple FPGA chips can work in unison at a higher SIMD level under the control of the host. Additionally, by using a memory switching scheme and the high-level PISA to partition applications into coarse-grain tasks, host-FPGA communication overheads can be hidden. We use the two-dimensional Fast Fourier Transform (2D FFT) to test the effectiveness of H-SIMD. The test results show sustained high performance for this problem, and the H-SIMD machine even outperforms a Xeon processor on it.

Key words: Configurable Computing, FPGA, SIMD, Parallel Processing, Memory Switching, FFT, Hardware-Software Codesign.
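
The abstract does not spell out how memory switching works. Assuming it behaves like classic double buffering (two memory banks per FPGA that swap roles, so the host fills one bank while the FPGA computes on the other), the hypothetical timing model below illustrates how such a scheme can hide host-FPGA transfers. Here `t_io` lumps together all host-FPGA transfers for one coarse-grain task and `t_comp` is that task's compute time. These parameters and the function names are illustrative assumptions, not quantities or code from the paper.

```python
def serial_time(n_tasks: int, t_io: float, t_comp: float) -> float:
    """Single memory bank (assumed baseline): load a task, then compute it, no overlap."""
    return n_tasks * (t_io + t_comp)


def switched_time(n_tasks: int, t_io: float, t_comp: float) -> float:
    """Hypothetical two-bank ('memory switching') schedule; returns the makespan.

    Task i uses bank i % 2. The host link carries one transfer at a time, the
    FPGA runs one task at a time, and a bank can only be refilled after the
    FPGA has finished the task whose data it held.
    """
    bank_free = [0.0, 0.0]  # time at which each bank's data has been consumed
    host_free = 0.0         # time at which the host link becomes idle
    fpga_free = 0.0         # time at which the FPGA becomes idle
    for i in range(n_tasks):
        bank = i % 2
        # Load the next task into the idle bank while the other bank is computed on.
        load_end = max(host_free, bank_free[bank]) + t_io
        host_free = load_end
        # Compute once the data is present and the FPGA has finished the previous task.
        comp_end = max(load_end, fpga_free) + t_comp
        fpga_free = comp_end
        bank_free[bank] = comp_end
    return fpga_free


if __name__ == "__main__":
    # Made-up numbers for illustration, not measurements from the paper.
    n, t_io, t_comp = 8, 2.0, 5.0
    print("single bank:     ", serial_time(n, t_io, t_comp))
    print("memory switching:", switched_time(n, t_io, t_comp))
```

With these made-up numbers the model gives 56 time units for a single bank and 42 with switching. Once the pipeline fills, each task costs max(t_io, t_comp) instead of t_io + t_comp, so the makespan is t_io + (n - 1) * max(t_io, t_comp) + t_comp, and transfers are fully hidden whenever t_comp >= t_io.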

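The abstract uses the 2D FFT as the benchmark and coarse-grain SIMD tasks as the unit of work. As a minimal sketch, and not the authors' PISA implementation, the standard row-column decomposition below computes a 2D FFT as two passes of independent 1D FFTs, one over rows and one over columns. Each pass is split into contiguous batches that stand in for coarse-grain tasks handed to separate SIMD units. The names `simd_unit_task`, `fft2d_coarse_grain`, and `num_units` are hypothetical. The batches run sequentially here, and both dimensions are assumed to be powers of two.

```python
import cmath
from typing import List

Matrix = List[List[complex]]


def fft(x: List[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd term
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def simd_unit_task(rows: Matrix) -> Matrix:
    """Hypothetical coarse-grain task: one SIMD unit runs a whole batch of 1D FFTs."""
    return [fft(r) for r in rows]


def fft2d_coarse_grain(a: Matrix, num_units: int) -> Matrix:
    """Row-column 2D FFT: 1D FFTs on every row, then on every column.

    Each pass splits its rows into at most num_units contiguous batches,
    one batch per (hypothetical) SIMD unit.
    """
    def one_pass(m: Matrix) -> Matrix:
        batch = -(-len(m) // num_units)  # ceiling division
        result: Matrix = []
        for start in range(0, len(m), batch):
            result.extend(simd_unit_task(m[start:start + batch]))
        return result

    rows_done = one_pass(a)                      # transform along each row
    cols_done = one_pass(transpose(rows_done))   # columns become rows
    return transpose(cols_done)                  # restore [u][v] layout


def dft2d(a: Matrix) -> Matrix:
    """Direct O(M^2 N^2) 2D DFT, used only as a reference."""
    m, n = len(a), len(a[0])
    return [[sum(a[r][c] * cmath.exp(-2j * cmath.pi * (u * r / m + v * c / n))
                 for r in range(m) for c in range(n))
             for v in range(n)]
            for u in range(m)]


if __name__ == "__main__":
    a = [[complex(r * 4 + c) for c in range(4)] for r in range(8)]  # 8 x 4 test input
    fast = fft2d_coarse_grain(a, num_units=3)
    ref = dft2d(a)
    err = max(abs(x - y) for fr, rr in zip(fast, ref) for x, y in zip(fr, rr))
    print(f"max abs error vs direct DFT: {err:.2e}")
```

Every 1D FFT within a pass is independent, so the batches could run on different units in parallel. Only the transpose between the two passes needs data from all units, which is where inter-FPGA or host-FPGA communication would arise in a real multi-FPGA system.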

Similar resources

FFTs in External or Hierarchical

Conventional algorithms for computing large one-dimensional fast Fourier transforms (FFTs), even those algorithms recently developed for vector and parallel computers, are largely unsuitable for systems with external or hierarchical memory. The principal reason for this is the fact that most FFT algorithms require at least m complete passes through the data set to compute a 2^m-point FFT. This ...

A Hierarchical Parallel Processing System for the Multipass-Rendering Method

The multipass-rendering method integrating radiosity with ray-tracing gives one of the best solutions for synthesizing photo-realistic images. However, the method is also computationally expensive. Therefore, parallel processing is the most promising approach to the fast multipass-rendering method. This paper presents a hierarchical parallel processing system for the multipass-rendering method....

Highly parallel processors in military systems (IEE Proceedings: Computers and Digital Techniques)

Parallelism has found its way into programmable processors as well as dedicated engines such as FFT and digital filters. However, choices of machine architecture are still open. We have evaluated two contrasting types to test their versatility and to compare their performance on algorithms related to military applications. Fine-grain SIMD and coarse-grain MIMD machines (MilDAP and Transputer a...

A High-Performance FFT Algorithm for Vector Supercomputers

Many traditional algorithms for computing the fast Fourier transform (FFT) on conventional computers are unacceptable for advanced vector and parallel computers because they involve nonunit, power-of-two memory strides. This paper presents a practical technique for computing the fast Fourier transform that completely avoids all such strides and appears to be near-optimal for a variety of curren...

A GPU Accelerated Aggregation Algebraic Multigrid Method

We present an efficient, robust and fully GPU-accelerated aggregation-based algebraic multigrid preconditioning technique for the solution of large sparse linear systems. These linear systems arise from the discretization of elliptic PDEs. The method involves two stages, setup and solve. In the setup stage, hierarchical coarse grids are constructed through aggregation of the fine grid nodes. Th...

Journal: IEICE Transactions
Volume: 89-D
Publication date: 2006